Anomaly Detection for Astronomical Data
Abstract
Modern astronomical observatories produce massive amounts of data, far more than researchers can inspect even briefly. These scientific observations present both great opportunities and great challenges for astronomers and machine learning researchers. In this project we address the problem of detecting anomalies (novelties) in these large-scale astronomical data sets. Two types of anomalies are considered: point anomalies and group anomalies. Point anomalies are individual anomalous objects, such as single stars or galaxies that show unique characteristics. Group anomalies are anomalous groups of objects, such as unusual clusters of galaxies that are close together. Both are of great value for astronomical studies, and our goal is to detect them automatically in an unsupervised way. For point anomalies, we adopt a subspace-based detection strategy and propose a robust low-rank matrix decomposition algorithm for more reliable results. For group anomalies, we use hierarchical probabilistic models to capture the generative mechanism of the data, and then score the data groups using various probability measures. Experimental evaluation on both synthetic and real-world data sets shows the effectiveness of the proposed methods. On a real astronomical data set, we obtained several interesting anecdotal results. Initial inspections by astronomers confirm the usefulness of these machine learning methods in astronomical research.
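To make the subspace-based idea for point anomalies concrete, the following is a minimal sketch, not the robust low-rank decomposition proposed in this work. It assumes that normal objects lie close to a low-dimensional linear subspace, estimates that subspace with an ordinary truncated SVD, and scores each object by its squared distance to the subspace. The function names, the synthetic data, and the choice of median centering are all illustrative assumptions.

```python
import numpy as np


def subspace_anomaly_scores(X, rank):
    """Score each row of X by its squared distance to a rank-`rank` subspace.

    Assumption: a plain truncated SVD stands in for the robust low-rank
    decomposition described in the abstract.
    """
    # Center with the per-feature median, which is less sensitive to
    # outliers than the mean.
    centered = X - np.median(X, axis=0)
    # Thin SVD: rows of vt are orthonormal directions, ordered by
    # decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]
    # Remove the part of each row that lies in the subspace; what remains
    # is the residual used as the anomaly score.
    residual = centered - centered @ basis.T @ basis
    return np.sum(residual ** 2, axis=1)


def make_demo_data(seed=0, n=200, dim=10, rank=2, planted=17):
    """Hypothetical data: points near a low-dimensional subspace plus one
    planted point pushed off that subspace."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, rank))
    mixing = rng.normal(size=(rank, dim))
    X = latent @ mixing + 0.01 * rng.normal(size=(n, dim))
    X[planted] += rng.normal(size=dim)
    return X, planted


X, planted = make_demo_data()
scores = subspace_anomaly_scores(X, rank=2)
top = int(np.argmax(scores))  # index of the most anomalous object
```

Because an ordinary SVD is itself sensitive to outliers, a few gross anomalies can tilt the estimated subspace; this sensitivity is what motivates a robust decomposition for more reliable results.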
Similar resources
Hierarchical Probabilistic Models for Group Anomaly Detection
Statistical anomaly detection typically focuses on finding individual point anomalies. Often the most interesting or unusual things in a data set are not odd individual points, but rather larger scale phenomena that only become apparent when groups of points are considered. In this paper, we propose generative models for detecting such group anomalies. We evaluate our methods on synthetic data ...
Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions
Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution....
Is the physics within the Solar system really understood? (arXiv:gr-qc/0604052v1, 11 Apr 2006)
A collection is made of presently unexplained phenomena within our Solar system and in the universe. These phenomena are (i) the Pioneer anomaly, (ii) the flyby anomaly, (iii) the increase of the Astronomical Unit, (iv) the quadrupole and octupole anomaly, and (v) Dark Energy and (vi) Dark Matter. A new data analysis of the complete set of Pioneer data is announced in order to search for system...
Moving dispersion method for statistical anomaly detection in intrusion detection systems
A unified method for statistical anomaly detection in intrusion detection systems is theoretically introduced. It is based on estimating a dispersion measure of numerical or symbolic data on successive moving windows in time and finding the times when a relative change of the dispersion measure is significant. Appropriate dispersion measures, relative differences, moving windows, as well as tec...
Behavior-Based Online Anomaly Detection for a Nationwide Short Message Service
As fraudsters understand the time window and act fast, real-time fraud management systems become necessary in the telecommunication industry. In this work, by analyzing traces collected from a nationwide cellular network over a period of a month, an online behavior-based anomaly detection system is provided. Over time, users' interactions with the network provide a vast amount of usage data. Thes...
Publication date: 2010